In this paper, we revisit the use of spectrograms by making the window length a continuous parameter optimizable by gradient descent, instead of an empirically tuned integer-valued hyperparameter. The contribution is mostly theoretical at this point, but plugging the modified STFT into any existing neural network is straightforward. We first define a differentiable version of the STFT for the case where the local bin centers are fixed and independent of the window length parameter. We then discuss the more difficult case where the window length affects the position and number of bins. We illustrate the benefits of this new tool on estimation and classification problems, showing that it can be of interest not only to neural networks but to any STFT-based signal processing algorithm.
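To make the fixed-bin-center case concrete, here is a minimal PyTorch-style sketch of an STFT whose window length is a learnable continuous parameter; the Gaussian window, the parameter name `sigma`, and the framing details are illustrative assumptions, not the authors' exact construction.

```python
import torch

def gaussian_window(sigma, n_fft):
    # Window with fixed support n_fft whose effective length is the
    # continuous parameter sigma; gradients flow back to sigma.
    t = torch.arange(n_fft, dtype=torch.float32) - (n_fft - 1) / 2
    return torch.exp(-0.5 * (t / sigma) ** 2)

def differentiable_stft(x, sigma, n_fft=512, hop=128):
    # Fixed framing keeps the bin centers independent of sigma (the
    # easier case above); only the window shape depends on the parameter.
    frames = x.unfold(-1, n_fft, hop)            # (n_frames, n_fft)
    return torch.fft.rfft(frames * gaussian_window(sigma, n_fft), dim=-1)

# The window length is now an ordinary learnable parameter.
sigma = torch.nn.Parameter(torch.tensor(64.0))
x = torch.randn(16000)
loss = differentiable_stft(x, sigma).abs().mean()
loss.backward()
print(sigma.grad)  # non-None: gradient descent can now tune the window length
```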
Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on using stateful, always-running components, recent work has shown that components in an FL system can greatly benefit from serverless computing and Function-as-a-Service technologies. To this end, distributed training of models with serverless FL systems can be more resource-efficient and cheaper than with conventional FL systems. However, serverless FL systems still suffer from the presence of stragglers, i.e., clients that are slow due to their resource and statistical heterogeneity. While several strategies have been proposed for mitigating stragglers in FL, most methodologies do not account for the particular characteristics of serverless environments, i.e., cold starts, performance variations, and the ephemeral stateless nature of function instances. Towards this, we propose FedLesScan, a novel clustering-based semi-asynchronous training strategy tailored specifically for serverless FL. FedLesScan dynamically adapts to the behaviour of clients and minimizes the effect of stragglers on the overall system. We implement our strategy by extending an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate our strategy using 2nd-generation Google Cloud Functions with four datasets and varying percentages of stragglers. Our experiments show that, compared to other approaches, FedLesScan reduces training time and cost by an average of 8% and 20%, respectively, while utilizing clients better, with an average increase in the effective update ratio of 17.75%.
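To give a feel for the clustering-based selection idea, below is a toy sketch; the bucketing-by-duration heuristic and all names are illustrative assumptions, not FedLess's or FedLesScan's actual algorithm or API.

```python
import random

def cluster_clients(durations, n_clusters=3):
    # Bucket clients by their observed round duration so stragglers land
    # together in the slowest cluster (a crude stand-in for the paper's
    # clustering of client behaviour).
    ranked = sorted(durations, key=durations.get)
    size = max(1, len(ranked) // n_clusters)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

def select_participants(clusters, per_round):
    # Fill the round mostly from faster clusters, but keep sampling the
    # slower ones so their updates are not starved (the semi-asynchronous
    # flavour).
    picks = []
    for cluster in clusters:
        k = min(per_round - len(picks), max(1, len(cluster) // 2))
        picks += random.sample(cluster, min(k, len(cluster)))
        if len(picks) >= per_round:
            break
    return picks

durations = {f"client{i}": random.uniform(1.0, 30.0) for i in range(12)}
print(select_participants(cluster_clients(durations), per_round=5))
```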
The data used to train ranking models is often subject to label noise. In web search, for example, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the SERP, users reformulating their queries, and erratic or unexpected user behavior. In practice, it is difficult to handle label noise without making strong assumptions about the label generation process. As a result, practitioners typically train their learning-to-rank (LTR) models directly on this noisy data without additional consideration of the label noise. Surprisingly, we often see strong performance from LTR models trained in this way. In this work, we describe a class of noise-tolerant LTR losses for which empirical risk minimization is a consistent procedure, even in the context of class-conditional label noise. We also develop noise-tolerant analogs of commonly used loss functions. The practical implications of our theoretical findings are further supported by experimental results.
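One standard construction in this spirit, though not necessarily the exact class described above, is the unbiased "backward" correction for class-conditional noise (in the style of Natarajan et al.), applied here to a pairwise logistic ranking loss; the flip-rate names rho_pos and rho_neg are illustrative.

```python
import torch
import torch.nn.functional as F

def corrected_pairwise_loss(margin, y_noisy, rho_pos, rho_neg):
    # Backward-corrected pairwise logistic loss: its expectation over
    # class-conditional label flips (rho_pos = P(flip | y=+1),
    # rho_neg = P(flip | y=-1)) equals the clean-label loss, so
    # empirical risk minimization on noisy pairs stays consistent.
    l_plus = F.softplus(-margin)   # loss if the pair label were +1
    l_minus = F.softplus(margin)   # loss if the pair label were -1
    pos = (y_noisy > 0).float()
    l_obs = pos * l_plus + (1 - pos) * l_minus       # loss at observed label
    l_flip = pos * l_minus + (1 - pos) * l_plus      # loss at flipped label
    rho_other = pos * rho_neg + (1 - pos) * rho_pos  # flip rate of class -y_noisy
    rho_obs = pos * rho_pos + (1 - pos) * rho_neg    # flip rate of class y_noisy
    return ((1 - rho_other) * l_obs - rho_obs * l_flip) / (1 - rho_pos - rho_neg)

# Toy usage: margins from a scoring model, noisy pairwise preferences.
margin = torch.randn(8, requires_grad=True)
y_noisy = torch.where(torch.rand(8) < 0.5, -1.0, 1.0)
corrected_pairwise_loss(margin, y_noisy, rho_pos=0.2, rho_neg=0.1).mean().backward()
```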
Most graph neural network models rely on a particular message passing paradigm, where the idea is to iteratively propagate node representations of a graph to each node in its direct neighborhood. While very prominent, this paradigm leads to information propagation bottlenecks, as information is repeatedly compressed at intermediary node representations, which causes a loss of information and makes it practically impossible to gather meaningful signals from distant nodes. To address this issue, we propose shortest path message passing neural networks, where the node representations of a graph are propagated to each node in its shortest path neighborhoods. In this setting, nodes can communicate directly with each other even if they are not neighbors, breaking the information bottleneck and hence leading to more adequately learned representations. Theoretically, our framework generalizes message passing neural networks, resulting in provably more expressive models, and we show that some recent state-of-the-art models are special instances of this framework. Empirically, we verify the capacity of a basic model of this framework on dedicated synthetic experiments as well as on real-world graph classification and regression benchmarks, obtaining state-of-the-art results.
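A bare-bones sketch of one layer in this spirit, hop-wise BFS neighborhoods aggregated with per-hop weight matrices, is shown below; the exact aggregation and parameterization are illustrative assumptions rather than the paper's precise model.

```python
import torch
from collections import deque

def bfs_distances(adj, source, max_k):
    # Plain BFS over an adjacency list, truncated at hop distance max_k.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_k:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def sp_message_passing(adj, h, weights):
    # One layer: each node aggregates the features of nodes at exact hop
    # distance k using a hop-specific weight matrix weights[k - 1].
    max_k = len(weights)
    new_h = torch.zeros_like(h)
    for v in range(len(adj)):
        for u, k in bfs_distances(adj, v, max_k).items():
            if k >= 1:
                new_h[v] += h[u] @ weights[k - 1]
    return torch.relu(new_h)

# Toy path graph 0-1-2-3: node 0 receives from node 2 directly at hop 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = torch.eye(4)
weights = [torch.randn(4, 4) for _ in range(2)]  # hops k = 1, 2
print(sp_message_passing(adj, h, weights))
```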
We consider the classic 1-center problem: given a set P of n points in a metric space, find the point in P that minimizes the maximum distance to the other points of P. We study the complexity of this problem in d-dimensional $\ell_p$-metrics and in the edit and Ulam metrics over strings of length d. Our results for the 1-center problem can be classified based on d as follows.

$\bullet$ Small d: We provide the first linear-time algorithm for the 1-center problem in fixed-dimensional $\ell_1$ metrics. On the other hand, assuming the Hitting Set Conjecture (HSC), we show that when $d = \omega(\log n)$, no subquadratic algorithm can solve the 1-center problem in any of the $\ell_p$-metrics, or in the edit or Ulam metrics.

$\bullet$ Large d: When $d = \Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for the 1-center problem in the edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for the 1-center problem in the Ulam metric with running time $\tilde{O}_{\epsilon}(nd + n^2\sqrt{d})$.

We also strengthen some of the above lower bounds by allowing approximation or by reducing the dimension d, albeit only against a weaker class of algorithms that list all requisite solutions. Moreover, we extend our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of n strings, each of length n, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
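For reference, the trivial quadratic-time baseline that the conditional lower bounds above target can be written in a few lines (an illustrative sketch; the $\ell_1$ metric and the toy points are arbitrary choices):

```python
def one_center(points, dist):
    # Brute-force O(n^2) baseline: return the input point whose farthest
    # neighbor is closest, together with that radius.
    best, best_radius = None, float("inf")
    for p in points:
        radius = max(dist(p, q) for q in points)
        if radius < best_radius:
            best, best_radius = p, radius
    return best, best_radius

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

points = [(0, 0), (4, 0), (2, 1), (2, 5)]
print(one_center(points, l1))  # ((2, 1), 4): farthest L1 neighbor at distance 4
```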